Take Home Exercise 2
pacman::p_load(sf, tmap, tidyverse, sfdep, ggplot2)
1. Overview
1.1 Introduction
Drug abuse is associated with significant negative health, financial and social consequences. Yet, illicit drug consumption remains highly prevalent and continues to be a growing problem worldwide. In 2021, 1 in 17 people aged 15–64 worldwide had used a drug in the past 12 months. Partly reflecting population growth, the estimated number of drug users grew from 240 million in 2011 to 296 million in 2021.
Thailand's proximity to the Golden Triangle of Indochina, the largest drug production site in Asia, together with its constantly developing transportation infrastructure, has made the country both a market and a transit route for drug trafficking to third countries.
In Thailand, drug abuse is one of the major social issues. There are about 2.7 million youths using drugs in Thailand, and among youths aged between 15 and 19 years, about 300,000 need drug treatment. Most Thai youths involved with drugs are vocational-school students, who number nearly double the secondary-school students involved.
1.2 Goal
In this exercise, we are going to achieve the following tasks:
Using appropriate functions of sf and tidyverse, prepare the following geospatial data layers:
a study area layer in sf polygon features. It must be at province level (including Bangkok) of Thailand.
a drug abuse indicators layer within the study area in sf polygon features.
Using the extracted data, perform global spatial autocorrelation analysis by using sfdep methods.
Using the extracted data, perform local spatial autocorrelation analysis by using sfdep methods.
Describe the spatial patterns revealed by the analysis above.
1.3 Importing of Packages
Before we start, we will import the packages required for our analysis.
We will be using the following packages:
sf: Manages spatial vector data, enabling operations like spatial joins, buffering, and transformations for points, lines, and polygons.
tmap: Creates and customizes thematic maps for spatial data visualization, including static and interactive maps with various map elements.
tidyverse: A suite of packages for data manipulation (dplyr), visualization (ggplot2), and tidying (tidyr), facilitating a streamlined workflow for data analysis.
ggplot2: A powerful data visualization package within the tidyverse, allowing the creation of complex and customizable graphs. It supports a wide range of geospatial plotting when combined with sf objects, enabling seamless integration of spatial data in layered plots like points, polygons, and lines.
sfdep: Facilitates spatial dependence analysis by providing tools to compute spatial weights, autocorrelation statistics, and other spatial econometric measures. It integrates with the sf package, allowing users to easily conduct spatial analysis on spatial data frames.
1.4 Dataset
For the purpose of this take-home exercise, two data sets shall be used:
Thailand Drug Offenses [2017-2022] at Kaggle.
Thailand - Subnational Administrative Boundaries at HDX. We will use the province boundary data set; the collection provides Thailand administrative level 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon) boundaries.
2. Data Wrangling
2.1 Importing Thailand Drug Offenses
This code chunk imports the Thailand Drug Offenses dataset from Kaggle, using the select() function to remove the province_th column, which is not useful to us. The dataset has a total of 7392 rows.
drug_offenses_sf <- read_csv("data/aspatial/drug_offence/thai_drug_offenses_2017_2022.csv") %>%
  select(-province_th)
summary(drug_offenses_sf)
  fiscal_year   types_of_drug_offenses    no_cases      province_en
Min. :2017 Length:7392 Min. : 0.0 Length:7392
1st Qu.:2018 Class :character 1st Qu.: 1.0 Class :character
Median :2020 Mode :character Median : 70.0 Mode :character
Mean :2020 Mean : 535.3
3rd Qu.:2021 3rd Qu.: 623.0
Max. :2022 Max. :17131.0
2.2 Importing Thailand - Subnational Administrative Boundaries
This code chunk imports the Thailand - Subnational Administrative Boundaries dataset using st_read(). As shown, there are a total of 77 geographic features, representing the 76 provinces of Thailand plus one additional special administrative area (Bangkok).
thailand_province_sf <- st_read(dsn="data/geospatial/tha_adm_rtsd_itos_20210121_shp/",
                                layer="tha_admbnda_adm1_rtsd_20220121") %>%
  select(ADM1_EN, geometry)
Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/Take-home_Ex/Take-home_Ex02/data/geospatial/tha_adm_rtsd_itos_20210121_shp'
using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS: WGS 84
This line of code checks if all the polygons are valid.
st_is_valid(thailand_province_sf)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE
This line of code displays the study area.
tm_shape(thailand_province_sf) +
  tm_polygons()
2.3 Data Cleansing & Extraction
2.3.1 Extracting Data
To see the different types of drug offenses currently in the dataset, we can use unique(). From this we can see that there are a total of 16 different types of offenses.
unique(drug_offenses_sf$types_of_drug_offenses)
 [1] "drug_use_cases"
[2] "suspects_in_drug_use_cases"
[3] "possession_cases"
[4] "suspects_in_possession_cases"
[5] "possession_with_intent_to_distribute_cases"
[6] "suspects_in_possession_with_intent_to_distribute_cases"
[7] "trafficking_cases"
[8] "suspects_in_trafficking_cases"
[9] "production_cases"
[10] "suspects_in_production_cases"
[11] "import_cases"
[12] "suspects_in_import_cases"
[13] "export_cases"
[14] "suspects_in_export_cases"
[15] "conspiracy_cases"
[16] "suspects_in_conspiracy_cases"
However, since this exercise focuses on drug abuse specifically, we will use filter() to keep only the drug use cases, limiting the scope of this project to those records.
drug_use_sf <- drug_offenses_sf %>% filter(types_of_drug_offenses == "drug_use_cases")
summary(drug_use_sf)
  fiscal_year   types_of_drug_offenses    no_cases      province_en
Min. :2017 Length:462 Min. : 32.0 Length:462
1st Qu.:2018 Class :character 1st Qu.: 798.2 Class :character
Median :2020 Mode :character Median : 1403.5 Mode :character
Mean :2020 Mean : 1981.7
3rd Qu.:2021 3rd Qu.: 2440.2
Max. :2022 Max. :16480.0
2.3.2 Performing relational join
2.3.2.1 Failed Attempt
To fix and standardize the data for consistent matching, we will use toupper() to ensure the columns ADM1_EN of thailand_province_sf and province_en of drug_use_sf can match correctly.
drug_use_sf$province_en <- toupper(drug_use_sf$province_en)
thailand_province_sf$ADM1_EN <- toupper(thailand_province_sf$ADM1_EN)
To ensure that the changes are applied correctly, we can use head(). As seen here, the province names are now all in uppercase.
head(drug_use_sf)
# A tibble: 6 × 4
fiscal_year types_of_drug_offenses no_cases province_en
<dbl> <chr> <dbl> <chr>
1 2017 drug_use_cases 11871 BANGKOK
2 2017 drug_use_cases 200 CHAI NAT
3 2017 drug_use_cases 553 NONTHABURI
4 2017 drug_use_cases 450 PATHUM THANI
5 2017 drug_use_cases 378 PHRA NAKHON SI AYUTTHAYA
6 2017 drug_use_cases 727 LOBURI
head(thailand_province_sf)
Simple feature collection with 6 features and 1 field
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 100.1913 ymin: 13.47842 xmax: 100.9639 ymax: 14.80246
Geodetic CRS: WGS 84
ADM1_EN geometry
1 BANGKOK MULTIPOLYGON (((100.6139 13...
2 SAMUT PRAKAN MULTIPOLYGON (((100.7306 13...
3 NONTHABURI MULTIPOLYGON (((100.3415 14...
4 PATHUM THANI MULTIPOLYGON (((100.8916 14...
5 PHRA NAKHON SI AYUTTHAYA MULTIPOLYGON (((100.5131 14...
6 ANG THONG MULTIPOLYGON (((100.3332 14...
The code chunk below joins the attribute tables of thailand_province_sf and drug_use_sf into a combined dataframe. This is performed by using left_join() of the dplyr package.
combined_drug_use_sf <- left_join(thailand_province_sf, drug_use_sf,
                                   by = c("ADM1_EN" = "province_en"))
After performing the left_join() between thailand_province_sf and drug_use_sf, we can use summary() to examine the resulting dataset. The output indicates that the combined dataset contains only 452 rows, compared with the original 462 rows of drug_use_sf.
This suggests that some rows in drug_use_sf did not successfully join with thailand_province_sf.
summary(combined_drug_use_sf)
   ADM1_EN           fiscal_year   types_of_drug_offenses    no_cases
Length:452 Min. :2017 Length:452 Min. : 32.0
Class :character 1st Qu.:2018 Class :character 1st Qu.: 788.8
Mode :character Median :2020 Mode :character Median : 1399.5
Mean :2020 Mean : 1994.2
3rd Qu.:2021 3rd Qu.: 2444.5
Max. :2022 Max. :16480.0
NA's :2 NA's :2
geometry
MULTIPOLYGON :452
epsg:4326 : 0
+proj=long...: 0
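The mechanics of those NA rows can be sketched with toy data. This is a minimal illustration with made-up province names and counts, using base R's merge() (which, with all.x = TRUE, behaves like dplyr's left_join() for this purpose):

```r
# Toy illustration (hypothetical data): a left join keeps every row of the
# left table and fills NA wherever a key finds no match on the right --
# exactly how the misspelt province names produced NA rows above.
provinces <- data.frame(name = c("LOPBURI", "BUENG KAN"))
cases     <- data.frame(name = c("LOBURI", "BUENG KAN"), n = c(727, 150))

# base R merge(all.x = TRUE) behaves like dplyr::left_join() here
joined <- merge(provinces, cases, by = "name", all.x = TRUE)

sum(is.na(joined$n))  # 1: "LOPBURI" found no match spelled that way
```

Counting the NA rows after a join is a quick sanity check that every key matched.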
2.3.2.2 Fixing the Mismatch in Province Names
The most likely reason is a mismatch in certain province names. To identify the province names that do not match between the two datasets, we can use anti_join() to find, in each dataset, the rows that failed to join. This provides a straightforward comparison of the discrepancies between the two datasets.
# Get rows in drug_use_sf that didn't match with thailand_province_sf
unmatched_in_drug_use <- anti_join(drug_use_sf, thailand_province_sf,
by = c("province_en" = "ADM1_EN"))
# Get rows in thailand_province_sf that didn't match with drug_offenses_sf
unmatched_in_thailand_province <- anti_join(thailand_province_sf, drug_use_sf,
by = c("ADM1_EN" = "province_en"))
# Display mismatched values
unmatched_province_en <- unique(unmatched_in_drug_use$province_en)
unmatched_adm1_en <- unique(unmatched_in_thailand_province$ADM1_EN)
# View the mismatched sets
list("Unmatched in drug_use_sf" = unmatched_province_en,
     "Unmatched in thailand_province_sf" = unmatched_adm1_en)
$`Unmatched in drug_use_sf`
[1] "LOBURI" "BUOGKAN"
$`Unmatched in thailand_province_sf`
[1] "LOP BURI" "BUENG KAN"
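On plain key columns, anti_join() reduces to a set difference. A toy sketch with hypothetical name vectors:

```r
# Toy sketch (hypothetical vectors): anti_join() keeps rows whose key has
# no match in the other table; on bare key vectors this is just setdiff().
drug_names     <- c("BANGKOK", "LOBURI", "BUOGKAN")
province_names <- c("BANGKOK", "LOP BURI", "BUENG KAN")

unmatched_left  <- setdiff(drug_names, province_names)   # only in drug data
unmatched_right <- setdiff(province_names, drug_names)   # only in boundaries

unmatched_left   # "LOBURI"  "BUOGKAN"
unmatched_right  # "LOP BURI" "BUENG KAN"
```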
As shown, two provinces are named differently in each dataset even though they refer to the same provinces, so we will need to standardise the names. Since "Lopburi" and "Bueng Kan" are the names used on Wikipedia, we will stick with those.
This code standardizes the province names in the two datasets, drug_use_sf and thailand_province_sf, using recode() to change the specific mismatched values.
# Update the values in the province_en column
drug_use_sf <- drug_use_sf %>%
mutate(province_en = recode(province_en,
"LOBURI" = "LOPBURI",
"BUOGKAN" = "BUENG KAN"))
# Update the values in the ADM1_EN column
thailand_province_sf <- thailand_province_sf %>%
mutate(ADM1_EN = recode(ADM1_EN,
"LOP BURI" = "LOPBURI",
                         "BUENG KAN" = "BUENG KAN"))
2.3.2.3 Reattempt At Performing Relational Join
Finally, let us perform the relational join again. As seen, we now have a total of 462 rows, which matches the number of rows in drug_use_sf. For easier reference, we also rename the ADM1_EN column to province.
combined_drug_use_sf <- left_join(thailand_province_sf, drug_use_sf,
by = c("ADM1_EN" = "province_en")) %>%
rename(province = ADM1_EN)
summary(combined_drug_use_sf)
   province          fiscal_year   types_of_drug_offenses    no_cases
Length:462 Min. :2017 Length:462 Min. : 32.0
Class :character 1st Qu.:2018 Class :character 1st Qu.: 798.2
Mode :character Median :2020 Mode :character Median : 1403.5
Mean :2020 Mean : 1981.7
3rd Qu.:2021 3rd Qu.: 2440.2
Max. :2022 Max. :16480.0
geometry
MULTIPOLYGON :462
epsg:4326 : 0
+proj=long...: 0
2.4 Drug Use Cases Visualisation
# Step 1: Aggregate data by year, summing the number of cases
drug_use_by_year <- combined_drug_use_sf %>% select(fiscal_year, no_cases) %>%
group_by(fiscal_year) %>%
summarise(total_cases = sum(no_cases), .groups = 'drop')
write_rds(drug_use_by_year, "data/rds/drug_use_by_year.rds")
# Create the bar plot with exact labels
ggplot(drug_use_by_year, aes(x = factor(fiscal_year), y = total_cases)) + # Convert fiscal_year to factor
geom_col(fill = "red", color = "black") + # Use geom_col for the bar plot
geom_text(aes(label = total_cases), vjust = -0.5, size = 5) + # Add text labels on top of bars
labs(title = "Total Drug Use Cases by Year", x = "Year", y = "Total Cases") +
theme_minimal() +
scale_x_discrete(drop = FALSE) + # Ensures all years are shown on the x-axis
  theme(plot.title = element_text(hjust = 0.5))
2.5 Geographic Distribution of Drug Use Cases by Province and by Year
Using the tmap methods below, we can create choropleth maps to visualise the geographic distribution of drug use cases by province and by year.
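The quantile style used in tm_polygons() places class breaks so that each class holds roughly the same number of provinces. A base R sketch of that classification, with made-up case counts, is:

```r
# Sketch of what style = "quantile" with n = 5 does under the hood: class
# breaks sit at the 0/20/40/60/80/100th percentiles, so each class holds
# roughly the same number of provinces. The counts here are made up.
set.seed(42)
no_cases <- round(runif(77, min = 32, max = 16480))

breaks  <- quantile(no_cases, probs = seq(0, 1, length.out = 6))
classes <- cut(no_cases, breaks = breaks, include.lowest = TRUE)

table(classes)  # roughly 15-16 provinces per class
```

This is why quantile maps always look "balanced" regardless of how skewed the underlying counts are.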
# Set tmap mode to view or plot
tmap_mode("plot")
tm_shape(combined_drug_use_sf) +
tm_polygons("no_cases", title = "Drug Use Cases", style = "quantile", palette = "Reds", n=5) +
tm_facets(by = "fiscal_year", nrow = 2, ncol = 3) + # Adjust rows and columns as needed
tm_layout(
main.title = "Geographic Distribution of Drug Use Cases by Province and by Year", # Main title for the entire plot
legend.position = c("left", "top"),
legend.text.size = 1.2,
main.title.size = 1.5,
main.title.position = "center"
  )
3. Global Measures of Spatial Autocorrelation
# Filter for each year from 2017 to 2022
drug_use_2017_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2017)
drug_use_2018_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2018)
drug_use_2019_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2019)
drug_use_2020_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2020)
drug_use_2021_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2021)
drug_use_2022_sf <- combined_drug_use_sf %>% filter(fiscal_year == 2022)
3.1 Computing Contiguity Spatial Weights (QUEEN)
Before calculating global spatial autocorrelation statistics, we first need to construct spatial weights for the study area. These weights define the neighborhood relationships between the geographical units (here, provinces) within the study region. To do so, we will use st_contiguity() of the sfdep package, the equivalent of the poly2nb() function from the spdep package. This function creates a list of neighboring regions based on shared boundaries and is required for the subsequent tests.
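The queen criterion can be illustrated without any spatial package: two polygons are queen neighbours if they share at least one boundary point, corners included. A minimal sketch (not sfdep itself) on a toy 2x2 grid of unit squares, each represented by its corner vertices:

```r
# Minimal sketch of queen contiguity: polygons sharing at least one boundary
# point (edge OR corner) are neighbours. Each unit square on a toy 2x2 grid
# is represented by its four corner vertices encoded as "x,y" strings.
square <- function(x, y) {
  paste(c(x, x + 1, x + 1, x), c(y, y, y + 1, y + 1), sep = ",")
}
polys <- list(A = square(0, 0), B = square(1, 0),
              C = square(0, 1), D = square(1, 1))

queen_nb <- function(polys) {
  nms <- names(polys)
  nb <- setNames(vector("list", length(polys)), nms)
  for (i in nms) {
    touches <- sapply(nms, function(j)
      length(intersect(polys[[i]], polys[[j]])) > 0)
    nb[[i]] <- nms[touches & nms != i]
  }
  nb
}
nb <- queen_nb(polys)
nb$A  # "B" "C" "D" -- D touches A only at the shared corner (1,1)
```

Under rook contiguity, which requires a shared edge, A and D would not be neighbours; the queen criterion used below is the more inclusive choice.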
The code chunk below is used to compute Queen contiguity weight matrix.
wm_q <- st_contiguity(thailand_province_sf, queen = TRUE)
write_rds(wm_q, "data/rds/wm_q.rds")
summary(wm_q)
Neighbour list object:
Number of regions: 77
Number of nonzero links: 352
Percentage nonzero weights: 5.93692
Average number of links: 4.571429
1 region with no links:
67
2 disjoint connected subgraphs
Link number distribution:
0 1 2 3 4 5 6 7 8 9
1 1 5 17 15 17 10 5 4 2
1 least connected region:
14 with 1 link
2 most connected regions:
29 51 with 9 links
The summary report above shows that there are 77 area units in Thailand, with an average of about 4.6 neighbours each.
The least connected province is Trat, with only 1 neighbour.
The most connected provinces, on the other hand, are Khon Kaen and Tak, each with 9 neighbours.
One province, Phuket, has no neighbours at all.
For reference, the code below produces a map highlighting where these provinces of interest are located.
# Define the regions and corresponding colors
highlight_info <- c(
"TRAT" = "yellow",
"KHON KAEN" = "red",
"TAK" = "blue",
"PHUKET" = "green"
)
# Set tmap mode to plot
tmap_mode("plot") # Use "plot" for static maps
# Create the base map with all provinces
base_map <- tm_shape(thailand_province_sf) +
tm_polygons(col = "lightgray", title = "Province Highlight")
# Initialize combined map with the base map
combined_map <- base_map
# Loop through the highlight_info to add each region
for (region in names(highlight_info)) {
combined_map <- combined_map +
tm_shape(thailand_province_sf[thailand_province_sf$ADM1_EN == region, ]) +
tm_polygons(col = highlight_info[region],
alpha = 0.7) # No title needed here
}
# Manually add a legend using tm_add_legend
legend_elements <- lapply(names(highlight_info), function(region) {
list(label = region, col = highlight_info[region])
})
# Add layout options and custom legend
combined_map +
tm_add_legend(type = "fill",
labels = names(highlight_info),
col = highlight_info,
title = "Highlighted Provinces") +
tm_layout(
title = "Highlighted Provinces in Thailand",
legend.position = c("right", "bottom"),
title.position = c("center", "top"),
legend.title.size = 1.2, # Adjust legend title size
legend.text.size = 1 # Adjust legend text size
  )
To further investigate why Phuket has no neighboring provinces, we can examine a zoomed-in map of the region. The map clearly illustrates that Phuket is an island, situated away from the mainland. This geographical separation accounts for its lack of direct neighboring provinces.
# Filter the dataset for Phuket province
phuket_sf <- thailand_province_sf %>%
filter(ADM1_EN == "PHUKET")
# Get bounding box for Phuket and slightly expand it to include surrounding provinces
bbox_phuket <- st_bbox(phuket_sf)
# Expand the bounding box by 0.1 degrees in each direction
bbox_phuket_expanded <- bbox_phuket
bbox_phuket_expanded[1] <- bbox_phuket[1] - 0.1 # xmin
bbox_phuket_expanded[2] <- bbox_phuket[2] - 0.1 # ymin
bbox_phuket_expanded[3] <- bbox_phuket[3] + 0.1 # xmax
bbox_phuket_expanded[4] <- bbox_phuket[4] + 0.1 # ymax
# Create a temporary dataset for plotting
phuket_plot_data <- thailand_province_sf %>%
mutate(is_phuket = ifelse(ADM1_EN == "PHUKET", "Phuket", "Other Provinces"))
# Set the color palette explicitly for Phuket and other provinces
color_palette <- c("Phuket" = "red", "Other Provinces" = "gray85")
# Plot using tmap and focus on the Phuket region and its surroundings
tmap_mode("plot") # Set tmap to static plotting mode
tm_shape(phuket_plot_data, bbox = bbox_phuket_expanded) +
tm_borders() + # Add borders of your spatial data
tm_fill(col = "is_phuket", palette = color_palette) + # Fill Phuket with red, others gray
  tm_layout(title = "Phuket and Surrounding Provinces", frame = TRUE) # Zoom in to Phuket region
3.2 Generating Spatial Weights
The chunk of code below utilizes the st_weights() function from the sfdep package to generate spatial weights for each neighboring polygon. Specifically, it constructs a weights matrix based on the contiguity defined by the wm_q object, allowing for the specification of weights style (here set to “W” for row-standardized weights). The allow_zero parameter is set to TRUE, which permits the inclusion of polygons with no neighbors in the analysis, ensuring that all spatial units are represented in the weights matrix.
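Row standardisation itself is simple enough to sketch in base R on a toy neighbour list (this is an illustration of the "W" style, not sfdep's implementation):

```r
# Sketch of the "W" (row-standardised) style: each neighbour of region i
# gets weight 1 / (number of neighbours of i), so every non-empty row of
# the weights matrix sums to 1. A region with no neighbours, like Phuket,
# simply keeps an empty weight vector when zeros are allowed.
nb <- list(c(2, 3), c(1), c(1), integer(0))  # region 4 has no neighbours

row_standardise <- function(nb) {
  lapply(nb, function(n)
    if (length(n) == 0) numeric(0) else rep(1 / length(n), length(n)))
}
wts <- row_standardise(nb)
wts[[1]]  # 0.5 0.5
```

Row standardisation keeps the statistic from being dominated by provinces that happen to have many neighbours.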
# wm_q is our neighbors list created using st_contiguity
rswm_q <- st_weights(wm_q, style = "W", allow_zero = TRUE)
3.3 Moran’s I
We will be using Moran’s I to compute global spatial autocorrelation instead of Geary’s C because Moran’s I provides a more comprehensive measure of spatial autocorrelation. While Geary’s C focuses on the squared differences between neighboring observations, which can make it sensitive to local variations and outliers, Moran’s I considers the overall correlation between values and their spatial locations. This allows us to capture both positive and negative spatial autocorrelation more effectively.
Furthermore, Moran’s I is well-suited for assessing patterns across larger regions, making it ideal for our analysis of spatial relationships of drug use across provinces in Thailand.
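The statistic itself can be computed by hand from the textbook formula. A base R sketch on toy data (not the sfdep implementation):

```r
# Hand-rolled Moran's I on toy data:
# I = (n / S0) * sum_ij[ w_ij (x_i - xbar)(x_j - xbar) ] / sum_i (x_i - xbar)^2
# where S0 is the sum of all weights.
morans_i <- function(x, W) {
  z  <- x - mean(x)
  S0 <- sum(W)
  (length(x) / S0) * sum(W * outer(z, z)) / sum(z^2)
}

# Four regions in a row, binary weights linking adjacent pairs
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1

morans_i(c(10, 9, 1, 2), W)  # positive: similar values sit next to each other
morans_i(c(10, 1, 9, 2), W)  # negative: values alternate high/low
```

The sign behaviour shown in the comments is exactly what the global tests below probe: values clustered among neighbours push I above its expectation of -1/(n-1).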
3.3.1 Moran’s I Test
The line of code below uses global_moran_test() of the sfdep package to perform Moran’s I statistical testing.
global_moran_test(drug_use_2018_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE)
Moran I test under randomisation
data: x
weights: listw
n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.673, p-value = 0.04716
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.091283835 -0.013333333 0.003910294
global_moran_test(drug_use_2019_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE)
Moran I test under randomisation
data: x
weights: listw
n reduced by no-neighbour observations
Moran I statistic standard deviate = 2.1385, p-value = 0.01624
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.138046280 -0.013333333 0.005010889
global_moran_test(drug_use_2020_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE)
Moran I test under randomisation
data: x
weights: listw
n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.382, p-value = 0.08349
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.087091664 -0.013333333 0.005280586
global_moran_test(drug_use_2021_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE)
Moran I test under randomisation
data: x
weights: listw
n reduced by no-neighbour observations
Moran I statistic standard deviate = 2.862, p-value = 0.002105
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.20376109 -0.01333333 0.00575372
global_moran_test(drug_use_2022_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE)
Moran I test under randomisation
data: x
weights: listw
n reduced by no-neighbour observations
Moran I statistic standard deviate = 2.7688, p-value = 0.002813
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.197931897 -0.013333333 0.005822019
3.3.2 Computing Monte Carlo Moran’s I
In the code chunk below, global_moran_perm() of sfdep is used to perform a permutation test for the Moran’s I statistic. A total of 1,000 simulations will be performed. We will also use set.seed() to ensure the reproducibility of our code.
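The permutation logic can be sketched in base R: shuffle the observed values across regions, recompute Moran's I each time, and compare the observed statistic against the simulated distribution. The pseudo p-value convention below is one common choice, not sfdep's exact code:

```r
# Sketch of a Monte Carlo (permutation) test for Moran's I.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

perm_test <- function(x, W, nsim = 999) {
  obs  <- morans_i(x, W)
  # random relabelling: each permutation breaks any spatial structure
  sims <- replicate(nsim, morans_i(sample(x), W))
  # two-sided pseudo p-value (one common convention)
  p    <- (sum(abs(sims) >= abs(obs)) + 1) / (nsim + 1)
  list(statistic = obs, p.value = p)
}

set.seed(1234)
W <- matrix(0, 4, 4)
W[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
res <- perm_test(c(10, 9, 1, 2), W)
```

Because the reference distribution is built from the data itself, this test makes no normality assumption, which is why it is preferred over the analytical test for small n.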
set.seed(1234)
gmc_i_2017<-global_moran_perm(drug_use_2017_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2017
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.077263, observed rank = 922, p-value = 0.156
alternative hypothesis: two.sided
set.seed(1234)
gmc_i_2018<-global_moran_perm(drug_use_2018_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2018
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.091284, observed rank = 944, p-value = 0.112
alternative hypothesis: two.sided
set.seed(1234)
gmc_i_2019<-global_moran_perm(drug_use_2019_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2019
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.13805, observed rank = 972, p-value = 0.056
alternative hypothesis: two.sided
set.seed(1234)
gmc_i_2020<-global_moran_perm(drug_use_2020_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2020
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.087092, observed rank = 924, p-value = 0.152
alternative hypothesis: two.sided
set.seed(1234)
gmc_i_2021<-global_moran_perm(drug_use_2021_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2021
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.20376, observed rank = 993, p-value = 0.014
alternative hypothesis: two.sided
set.seed(1234)
gmc_i_2022<-global_moran_perm(drug_use_2022_sf$no_cases,wm_q,rswm_q, zero.policy = TRUE, nsim=999)
gmc_i_2022
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = 0.19793, observed rank = 994, p-value = 0.012
alternative hypothesis: two.sided
3.4 Visualising Monte Carlo Moran’s I
To examine the simulated Moran’s I test statistics in greater detail, we can plot the distribution of the statistical values as histograms by using the code chunk below, in which hist() and abline() of R Graphics are used.
3.4.1 Descriptive Statistics of Simulated Moran’s I
# 2017
print(summary(gmc_i_2017$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.14280 -0.05100 -0.02108 -0.01248  0.02061  0.20105
# 2018
print(summary(gmc_i_2018$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.14516 -0.05259 -0.01839 -0.01148  0.02319  0.26485
# 2019
print(summary(gmc_i_2019$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.15737 -0.05925 -0.01691 -0.01110  0.02752  0.31845
# 2020
print(summary(gmc_i_2020$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.15929 -0.06245 -0.02221 -0.01443  0.02685  0.33025
# 2021
print(summary(gmc_i_2021$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.23519 -0.06342 -0.01758 -0.01355  0.03012  0.29476
# 2022
print(summary(gmc_i_2022$res[1:999]))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -0.21825 -0.05910 -0.01462 -0.01072  0.03411  0.30446
# Calculating variances and storing them in variables
gmc_i_var_2017 <- var(gmc_i_2017$res[1:999])
gmc_i_var_2018 <- var(gmc_i_2018$res[1:999])
gmc_i_var_2019 <- var(gmc_i_2019$res[1:999])
gmc_i_var_2020 <- var(gmc_i_2020$res[1:999])
gmc_i_var_2021 <- var(gmc_i_2021$res[1:999])
gmc_i_var_2022 <- var(gmc_i_2022$res[1:999])
# Printing the variances
cat(sprintf("Variance of '2017': %f\n", gmc_i_var_2017))
Variance of '2017': 0.003137
cat(sprintf("Variance of '2018': %f\n", gmc_i_var_2018))
Variance of '2018': 0.003527
cat(sprintf("Variance of '2019': %f\n", gmc_i_var_2019))
Variance of '2019': 0.004564
cat(sprintf("Variance of '2020': %f\n", gmc_i_var_2020))
Variance of '2020': 0.004721
cat(sprintf("Variance of '2021': %f\n", gmc_i_var_2021))
Variance of '2021': 0.005436
cat(sprintf("Variance of '2022': %f\n", gmc_i_var_2022))
Variance of '2022': 0.005386
3.4.2 Histogram of Simulated Values
# Set up the plotting area for 2 columns and 3 rows
par(mfrow = c(3, 2))
# Histogram for 2017
hist(gmc_i_2017$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2017")
abline(v = 0, col = "red")
# Histogram for 2018
hist(gmc_i_2018$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2018")
abline(v = 0, col = "red")
# Histogram for 2019
hist(gmc_i_2019$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2019")
abline(v = 0, col = "red")
# Histogram for 2020
hist(gmc_i_2020$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2020")
abline(v = 0, col = "red")
# Histogram for 2021
hist(gmc_i_2021$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2021")
abline(v = 0, col = "red")
# Histogram for 2022
hist(gmc_i_2022$res,
freq = TRUE,
breaks = 20,
xlab = "Simulated Moran's I",
main = "Histogram for 2022")
abline(v = 0, col = "red")